[Improvement-18039][Metrics] Add missing metrics for workflow and task state transitions#18038
[Improvement-18039][Metrics] Add missing metrics for workflow and task state transitions#18038sanjana2505006 wants to merge 1 commit intoapache:devfrom
Conversation
990a504 to
6176e33
Compare
SbloodyS
left a comment
There was a problem hiding this comment.
Please follow the pull request notice and create an issue first. @sanjana2505006
9c47397 to
c0013cc
Compare
|
Please check the failed CI. @sanjana2505006 |
|
Sure, @SbloodyS, I have an exam today. I’ll take a look at the failed CI once I’m done 😃... |
2a2647b to
edeef4e
Compare
|
Hello @SbloodyS, I've updated the PR to address the CI failures. Please have a look when you have time and share your feedback. Thank you! |
|
UT still failed. @sanjana2505006 |
ruanwenjun
left a comment
There was a problem hiding this comment.
Thanks for your PR.
It might be better to add event metrics first.
WorkflowEventBusFireWorker#doFireSingleWorkflowEventBus
We may need to provide a public method to handle changes in instance state, and then use listeners or similar mechanisms to update the metrics.
So it’s not suitable to add this directly at the moment, but I’ll be doing that soon.
| // The task instance is not active, means it is already finished. | ||
| return; | ||
| } | ||
| org.apache.dolphinscheduler.server.master.metrics.TaskMetrics.incTaskInstanceByState("timeout"); |
There was a problem hiding this comment.
| org.apache.dolphinscheduler.server.master.metrics.TaskMetrics.incTaskInstanceByState("timeout"); | |
| TaskMetrics.incTaskInstanceByState("timeout"); |
There was a problem hiding this comment.
I've removed this entirely.
And have deleted this direct metric call and moved the timeout tracking logic to the event bus worker instead
| taskInstance.setState(TaskExecutionStatus.FAILURE); | ||
| taskInstance.setEndTime(taskFatalEvent.getEndTime()); | ||
| taskInstanceDao.updateById(taskInstance); | ||
| TaskMetrics.incTaskInstanceByState("fail"); |
There was a problem hiding this comment.
| TaskMetrics.incTaskInstanceByState("fail"); | |
| TaskMetrics.incTaskInstanceByState(TaskExecutionStatus.FAILURE); |
There was a problem hiding this comment.
All of these direct metric calls inside the state actions have been deleted. The failure tracking is now natively handled by the event bus worker alongside the other task events.
…k state transitions This PR adds missing metrics for task and workflow execution into DolphinScheduler metrics. It also adopts an event-driven mechanism to track Task metrics cleanly upon Task event bus fires.
f4919d7 to
e30a3d3
Compare
|


Purpose
This PR adds missing metrics to the Master module to improve visibility into workflow and task execution states. It addresses the gap where several metrics classes were defined but not utilized in the core execution flow.
Proposed Changes
AbstractWorkflowStateAction.AbstractTaskStateAction.TaskTimeoutLifecycleEventHandler.WorkflowExecutionRunnableFactory.Verification
mvn spotless:apply.This PR closes #18039